Data

First, we will read in the MISR datasets which have been matched to the AQS and CSN datasets. The matching was done in two steps. Spatially, every AQS/CSN data collection site within a 2.2 km radius of a MISR data pixel was treated as a candidate match. These candidate matches were then filtered by the dates on which the observations were recorded, so that only observations with matching dates were kept.
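The two-step matching described above can be sketched as follows. This is a minimal illustration in Python, not the code used for this analysis: the haversine distance, the Earth radius constant, and the field names `lat`, `lon`, and `date` are all assumptions, since the source does not state how distances were computed or how the records are laid out.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_sites_to_pixel(pixel, sites, radius_km=2.2):
    """Keep sites recorded on the pixel's date and within radius_km of it.

    `pixel` and each site are dicts with hypothetical keys
    'lat', 'lon', and 'date'.
    """
    return [
        s for s in sites
        if s["date"] == pixel["date"]
        and haversine_km(pixel["lat"], pixel["lon"], s["lat"], s["lon"]) <= radius_km
    ]
```

For example, a site 0.01 degrees of latitude away from a pixel (about 1.1 km) and recorded on the same date would be kept, while a site 0.05 degrees away (about 5.6 km) would not.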

We will also slightly alter these datasets by changing the way that dates are stored. Instead of storing each date as a single object in YYYY-MM-DD format, we will store the day, month, and year as three separate attributes.
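As an illustration of this change, the sketch below splits a YYYY-MM-DD string into three attributes. It is written in Python for illustration only. The function name is hypothetical, and the attribute names `Year`, `Month`, and `Day` follow the variable names used in the missing-value tables later in this report.

```python
from datetime import date


def split_date(iso_date):
    """Split a 'YYYY-MM-DD' string into separate year, month, and day values."""
    d = date.fromisoformat(iso_date)
    return {"Year": d.year, "Month": d.month, "Day": d.day}
```

Parsing with `date.fromisoformat` rather than splitting on hyphens means that an impossible date such as 2004-02-30 raises an error instead of passing through silently.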

In addition to the data which we collected from the CSN dataset, we will also use a formula to estimate the total dust mass in a given area from the measured concentrations of five elements (Al, Si, Ca, Ti, and Fe).

The formula for computing dust mass is given by \(\text{Dust Mass} = 2.2\times\text{Al} + 2.49\times\text{Si} + 1.63\times\text{Ca} + 1.94\times\text{Ti} + 2.42\times\text{Fe}\).
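The formula above translates directly into code. The sketch below is an illustrative Python version. The dictionary layout and the rule that a missing element makes the dust mass missing are assumptions, not details taken from the source.

```python
# Coefficients copied from the dust mass formula above.
DUST_COEFFICIENTS = {"Al": 2.2, "Si": 2.49, "Ca": 1.63, "Ti": 1.94, "Fe": 2.42}


def dust_mass(concentrations):
    """Weighted sum of elemental concentrations.

    `concentrations` maps element symbols to measured values.
    Returns None if any required element is missing (an assumed convention).
    """
    values = [concentrations.get(element) for element in DUST_COEFFICIENTS]
    if any(v is None for v in values):
        return None
    return sum(coef * v for coef, v in zip(DUST_COEFFICIENTS.values(), values))
```

With every element at a concentration of 1, the result is simply the sum of the coefficients, 10.68.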

Exploratory Data Analysis

First, we will carry out some exploratory data analysis on these datasets, to better understand the data which we collected.

Numerical Summaries

We will examine brief numerical summaries of our four main variables, in order to learn more about their general distributions.

Numerical Summaries of the variables we want to predict

| Variable | Minimum | 25th percentile | Median | 75th percentile | Maximum | IQR | Range | Mean | Standard Deviation | Present Values | Missing Values |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dust Mass | -0.0284 | 0.3503 | 0.6126 | 1.0773 | 18.3301 | 0.7270 | 18.3584 | 0.9065 | 1.0361 | 5115 | 174 |
| Nitrate | 0.0000 | 0.6300 | 1.4000 | 3.4400 | 53.9000 | 2.8100 | 53.9000 | 3.0576 | 4.6030 | 5073 | 216 |
| Sulfate | 0.0000 | 0.5640 | 0.9943 | 1.6800 | 10.7000 | 1.1160 | 10.7000 | 1.3069 | 1.1535 | 5094 | 195 |
| PM2.5 | -7.2000 | 5.0000 | 8.0833 | 12.4583 | 529.4167 | 7.4583 | 536.6167 | 10.4596 | 10.6680 | 157005 | 0 |
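The summaries in this table can be computed along the following lines. This is an illustrative Python sketch. The function name and the use of `None` for missing values are assumptions, and the "inclusive" quantile method is chosen because it interpolates the same way as R's default quantile type; the source does not state which quantile definition was used.

```python
import statistics


def summarize(values):
    """Numerical summary of one variable, with None treated as missing."""
    present = [v for v in values if v is not None]
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "min": min(present),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(present),
        "iqr": q3 - q1,
        "range": max(present) - min(present),
        "mean": statistics.mean(present),
        "sd": statistics.stdev(present),  # sample standard deviation
        "present": len(present),
        "missing": len(values) - len(present),
    }
```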

Based on the table above, we see that there are a few negative values recorded for Dust Mass and PM2.5 concentrations. Since a concentration cannot be negative (we cannot have a negative amount of a particle), we will replace all negative values with 0.
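A minimal sketch of this replacement is given below, in Python for illustration. The function name is hypothetical, and leaving missing values (represented here as `None`) untouched is an assumption.

```python
def clamp_negative_to_zero(values):
    """Replace negative values with 0, leaving missing values (None) as they are."""
    return [None if v is None else max(v, 0.0) for v in values]
```

Applied to the minimum values in the table, this maps -0.0284 (Dust Mass) and -7.2 (PM2.5) to 0, while non-negative values pass through unchanged.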

Histograms

In addition to the numerical summaries, we will examine histograms of these variables to see their overall distributions in a more visual manner.

From the plots above, we see that the distributions of all four variables are right-skewed: there are quite a few high-valued outliers in these datasets, and no corresponding number of low values to balance them, since these data are all non-negative.

The log-plots above all appear relatively symmetrical and look roughly like Normal distributions, which may be helpful for model fitting and prediction, as the log-transformed distributions are far less skewed by their few large values.

Historical Data

In addition to the histograms, which show the general distributions of these data over our 22-year period, we have created time series plots of dust mass, nitrate, PM2.5, and sulfate concentrations in California to visualize how these quantities have changed over time.

The PM2.5 data are sourced from the AQS data collection sites, whereas the dust mass, nitrate, and sulfate concentrations come from the CSN datasets.

[Figure panels: "Monthly Dust Mass Concentrations in California", "Monthly Nitrate Concentrations in California", "Monthly PM2.5 Concentrations in California", "Monthly Dust Mass Concentrations in California"]

Finding Missing Values

To start, we will count the missing values in our datasets, to determine how much of the data which we aim to use is actually present.
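Counts of this kind can be produced with a small helper such as the one sketched below. It is written in Python for illustration and assumes that each observation is a dictionary in which a missing value is stored as `None` (or the key is absent). Both the function name and that layout are hypothetical.

```python
def count_missing(rows, columns):
    """Return {column: (recorded, missing)} for the given columns."""
    counts = {}
    for column in columns:
        recorded = sum(1 for row in rows if row.get(column) is not None)
        counts[column] = (recorded, len(rows) - recorded)
    return counts
```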

Missing Values in the AQS Dataset

Counts of Variables in the merged MISR and AQS dataset
Variable Name Recorded Values Missing Values
PM25 157005 0
Year 157005 0
Month 157005 0
Day 157005 0
Site.Latitude 157005 0
Site.Longitude 157005 0
elevation 157005 0
pixel.latitude 157005 0
pixel.longitude 157005 0
AOD 50089 106916
AOD_uncertainty 50089 106916
angstrom_exp_550_860 50089 106916
AOD_absorption 50089 106916
AOD_nonspherical 50089 106916
small_mode_AOD 50089 106916
medium_mode_AOD 50089 106916
large_mode_AOD 50089 106916
aod_mix_01 57964 99041
aod_mix_02 58151 98854
aod_mix_03 58387 98618
aod_mix_04 58675 98330
aod_mix_05 58870 98135
aod_mix_06 59079 97926
aod_mix_07 59220 97785
aod_mix_08 59105 97900
aod_mix_09 58027 98978
aod_mix_10 54411 102594
aod_mix_11 63009 93996
aod_mix_12 62969 94036
aod_mix_13 62990 94015
aod_mix_14 62899 94106
aod_mix_15 62332 94673
aod_mix_16 61316 95689
aod_mix_17 59339 97666
aod_mix_18 56133 100872
aod_mix_19 51824 105181
aod_mix_20 46472 110533
aod_mix_21 49385 107620
aod_mix_22 49028 107977
aod_mix_23 48577 108428
aod_mix_24 47605 109400
aod_mix_25 46377 110628
aod_mix_26 45070 111935
aod_mix_27 43693 113312
aod_mix_28 42072 114933
aod_mix_29 40508 116497
aod_mix_30 38868 118137
aod_mix_31 61811 95194
aod_mix_32 61716 95289
aod_mix_33 61529 95476
aod_mix_34 60895 96110
aod_mix_35 60081 96924
aod_mix_36 58316 98689
aod_mix_37 55627 101378
aod_mix_38 52266 104739
aod_mix_39 48168 108837
aod_mix_40 43772 113233
aod_mix_41 53302 103703
aod_mix_42 53203 103802
aod_mix_43 53113 103892
aod_mix_44 52600 104405
aod_mix_45 51719 105286
aod_mix_46 50327 106678
aod_mix_47 48476 108529
aod_mix_48 46027 110978
aod_mix_49 43313 113692
aod_mix_50 40558 116447
aod_mix_51 60791 96214
aod_mix_52 54792 102213
aod_mix_53 38906 118099
aod_mix_54 51407 105598
aod_mix_55 43584 113421
aod_mix_56 33043 123962
aod_mix_57 38243 118762
aod_mix_58 33797 123208
aod_mix_59 29758 127247
aod_mix_60 29972 127033
aod_mix_61 29164 127841
aod_mix_62 28486 128519
aod_mix_63 38842 118163
aod_mix_64 37758 119247
aod_mix_65 36746 120259
aod_mix_66 35872 121133
aod_mix_67 29471 127534
aod_mix_68 28945 128060
aod_mix_69 28730 128275
aod_mix_70 28666 128339
aod_mix_71 28061 128944
aod_mix_72 28061 128944
aod_mix_73 28068 128937
aod_mix_74 28088 128917

First, we notice that there are no missing values for PM2.5, the date-related variables, or the spatial variables. This is excellent, as PM2.5 is the value we want to predict in this dataset, and the date and location variables are among the most important inputs to the models which we will fit.

We can also notice that each of the 8 AOD variables has the same number of recorded and missing values. If we examine these 8 variables further, we find that they are a “package deal”: for each observation, there is either a recorded value for all 8 of these variables, or a missing value for all 8.
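The all-or-nothing pattern described above can be checked directly. The sketch below is illustrative Python; the variable names in `AOD_COLUMNS` are taken from the table above, while the function name and the row layout (dictionaries with `None` for missing values) are assumptions.

```python
# The 8 AOD variables listed in the table above.
AOD_COLUMNS = [
    "AOD", "AOD_uncertainty", "angstrom_exp_550_860", "AOD_absorption",
    "AOD_nonspherical", "small_mode_AOD", "medium_mode_AOD", "large_mode_AOD",
]


def all_or_none(rows, columns):
    """True if every row has either all of `columns` recorded or all missing."""
    for row in rows:
        n_missing = sum(1 for c in columns if row.get(c) is None)
        if n_missing not in (0, len(columns)):
            return False
    return True
```

Equal per-column counts alone do not guarantee this pattern, since different columns could be missing on different rows, which is why the row-level check is worth running.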

Unfortunately, the same cannot be said for the 74 AOD mixture variables. From the table above, we can clearly see that the number of available observations varies from mixture to mixture. However, even the mixtures with the fewest recorded observations (aod_mix_71 and aod_mix_72) each have 28061 recorded values. Furthermore, 20604 observations have a recorded value for every one of the 74 mixtures, which is a fair amount of data to work with.

Missing Values in the CSN Dataset

Counts of Variables in the merged MISR and CSN dataset
Variable Name Recorded Values Missing Values
nitrate 5073 216
sulfate 5094 195
dust 5115 174
Year 5289 0
Month 5289 0
Day 5289 0
Site.Latitude 5289 0
Site.Longitude 5289 0
Mean.Temp 4903 386
Min.Temp 4009 1280
Max.Temp 4006 1283
Atm.Press 4891 398
elevation 5289 0
pixel.latitude 5289 0
pixel.longitude 5289 0
AOD 1709 3580
AOD_uncertainty 1709 3580
angstrom_exp_550_860 1709 3580
AOD_absorption 1709 3580
AOD_nonspherical 1709 3580
small_mode_AOD 1709 3580
medium_mode_AOD 1709 3580
large_mode_AOD 1709 3580
aod_mix_01 1936 3353
aod_mix_02 1944 3345
aod_mix_03 1949 3340
aod_mix_04 1959 3330
aod_mix_05 1970 3319
aod_mix_06 1987 3302
aod_mix_07 1993 3296
aod_mix_08 1985 3304
aod_mix_09 1961 3328
aod_mix_10 1811 3478
aod_mix_11 2201 3088
aod_mix_12 2202 3087
aod_mix_13 2188 3101
aod_mix_14 2168 3121
aod_mix_15 2163 3126
aod_mix_16 2147 3142
aod_mix_17 2069 3220
aod_mix_18 1931 3358
aod_mix_19 1758 3531
aod_mix_20 1518 3771
aod_mix_21 1672 3617
aod_mix_22 1654 3635
aod_mix_23 1642 3647
aod_mix_24 1587 3702
aod_mix_25 1528 3761
aod_mix_26 1457 3832
aod_mix_27 1410 3879
aod_mix_28 1343 3946
aod_mix_29 1282 4007
aod_mix_30 1214 4075
aod_mix_31 2130 3159
aod_mix_32 2130 3159
aod_mix_33 2136 3153
aod_mix_34 2122 3167
aod_mix_35 2090 3199
aod_mix_36 2018 3271
aod_mix_37 1915 3374
aod_mix_38 1747 3542
aod_mix_39 1582 3707
aod_mix_40 1405 3884
aod_mix_41 1773 3516
aod_mix_42 1766 3523
aod_mix_43 1760 3529
aod_mix_44 1731 3558
aod_mix_45 1713 3576
aod_mix_46 1653 3636
aod_mix_47 1592 3697
aod_mix_48 1506 3783
aod_mix_49 1398 3891
aod_mix_50 1291 3998
aod_mix_51 2130 3159
aod_mix_52 1886 3403
aod_mix_53 1214 4075
aod_mix_54 1739 3550
aod_mix_55 1400 3889
aod_mix_56 1016 4273
aod_mix_57 1195 4094
aod_mix_58 1036 4253
aod_mix_59 919 4370
aod_mix_60 924 4365
aod_mix_61 894 4395
aod_mix_62 875 4414
aod_mix_63 1216 4073
aod_mix_64 1175 4114
aod_mix_65 1134 4155
aod_mix_66 1101 4188
aod_mix_67 905 4384
aod_mix_68 890 4399
aod_mix_69 887 4402
aod_mix_70 886 4403
aod_mix_71 867 4422
aod_mix_72 867 4422
aod_mix_73 867 4422
aod_mix_74 867 4422

Charts and Graphs

Next, we will create some charts and plots of the matched MISR data, to get visual representations of the data which we have collected.

First, we will create a “correlation heatmap” to visually depict the correlations between the 74 AOD mixtures which were collected in the MISR data. In the correlation heatmap shown below, the correlations between these mixtures are measured on a scale from -1 to 1, and each square in the heatmap is coloured so that its colour and intensity reflect the correlation between the corresponding pair of variables.

Correlation Heatmap for the 74 MISR Mixtures

As the correlation heatmap displayed above shows, the 74 AOD mixtures in the collected MISR data are all strongly positively correlated with one another: the entire heatmap is green.

In fact, the weakest correlation between any pair of these 74 AOD mixtures is 0.681, between aod_mix_01 and aod_mix_44. Even this value is still considered a strong positive linear relationship between two variables.
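A search for the weakest pairwise correlation can be sketched as follows. This is illustrative Python rather than the code behind the heatmap. It assumes Pearson correlation computed over the observations where both mixtures are recorded (pairwise-complete), which the source does not confirm, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation over positions where both values are recorded."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def weakest_pair(columns):
    """Return (r, name_a, name_b) for the pair with the smallest |r|.

    `columns` maps a variable name to its list of values.
    """
    candidates = (
        (pearson(columns[a], columns[b]), a, b)
        for a, b in combinations(columns, 2)
    )
    return min(candidates, key=lambda item: abs(item[0]))
```

With 74 mixtures this compares 74 × 73 / 2 = 2701 pairs.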

Model Fitting

Next, we will test a variety of model fitting techniques on our dataset, in order to determine which models perform better and give more accurate predictions for our dataset.

We will fit many different models, as there are several values in these two datasets which we want to predict, and several sets of predictors which we aim to incorporate.

The 6 main values which we want to predict are: PM2.5, \(\text{SO}_{4}^{2-}\) (sulfate), \(\text{NO}_{3}^{-}\) (nitrate), dust mass, elemental carbon, and organic carbon. The two primary sets of predictors which we want to use are the 8 measured AOD values and the 74 MISR AOD mixtures.

In addition to the two sets of predictors mentioned above, we will also introduce a “Months” variable to help account for changes in these values over time. The Months variable is the number of months which have elapsed since March 2000, the starting point of the data which we have collected.
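The Months variable can be computed from the separate year and month attributes, as in the illustrative Python sketch below. The function name is hypothetical, and whether March 2000 itself maps to 0 or to 1 is not stated in the source; this sketch assumes 0.

```python
def months_since_start(year, month, start_year=2000, start_month=3):
    """Whole months elapsed since the start month (March 2000 by default).

    Assumes the start month itself is counted as 0.
    """
    return (year - start_year) * 12 + (month - start_month)
```

Under this convention April 2000 gives 1 and March 2001 gives 12.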

To fit our models, we will first remove any observations which have missing values for the desired predictors. Then, we will split the remaining data into a training dataset, a validation dataset, and a test dataset using a 70/15/15 ratio.
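The filtering and splitting steps above can be sketched as follows. This is illustrative Python under stated assumptions: the source does not say whether the split is random, so the seeded shuffle, the function name, and the row layout (dictionaries with `None` for missing values) are all hypothetical.

```python
import random


def split_70_15_15(rows, predictors, seed=0):
    """Drop rows with a missing predictor, then split 70/15/15.

    Returns (train, validation, test). The shuffle is seeded so that
    the split is reproducible.
    """
    complete = [r for r in rows if all(r.get(p) is not None for p in predictors)]
    shuffled = complete[:]
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding in the cut points.
    n_train = len(shuffled) * 70 // 100
    n_valid = len(shuffled) * 15 // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test
```

Any rows left over from rounding fall into the test set, so the three parts always add up to the number of complete observations.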

PM2.5 AOD XGBoost

Using the xgboost package, we will fit an XGBoost (eXtreme Gradient Boosting) model which predicts PM2.5 concentrations based on the 8 AOD parameters.

To fit this model, we will tune the XGBoost hyperparameters (using the caret package) by fitting an XGBoost model for each combination of candidate hyperparameter values and scoring it on the validation dataset. The combination(s) with the highest R-squared and the lowest RMSE on the validation dataset will be selected as candidate models to use on the test set.
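The structure of this search can be sketched without the model itself. The Python below is illustrative only and does not call xgboost or caret. The candidate values are read off the validation tables in this report (72 combinations per response), while the function names and the definition of R-squared as a squared Pearson correlation are assumptions, since the source does not state how R2 was computed.

```python
import math
from itertools import product

# Candidate values as they appear in the validation tables of this report.
GRID = {
    "eta": [0.1, 0.3, 0.6, 1.0],
    "colsample_bytree": [0.5, 0.75, 1.0],
    "min_child_weight": [0, 1],
    "subsample": [0.5, 0.75, 1.0],
}
FIXED = {"nrounds": 100, "max_depth": 10, "gamma": 0.01}


def expand_grid(grid, fixed):
    """One dict of hyperparameters per combination of candidate values."""
    names = list(grid)
    return [
        dict(fixed, **dict(zip(names, combo)))
        for combo in product(*(grid[name] for name in names))
    ]


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sxy = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)


def best_rows(results):
    """All result rows tied for the lowest validation RMSE."""
    lowest = min(row["RMSE"] for row in results)
    return [row for row in results if row["RMSE"] == lowest]
```

Returning every tied row rather than a single winner matches the "Best-Performing Model(s)" tables, where two settings can share the same validation score.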

Model Performance of XGBoost models on the validation dataset
Model: Predicting PM2.5 Concentration using AOD Parameters
nrounds eta max_depth gamma colsample_bytree min_child_weight subsample RMSE R2
100 0.1 10 0.01 0.50 0 0.50 5.298538 0.6734170
100 0.3 10 0.01 0.50 0 0.50 5.370289 0.6644866
100 0.6 10 0.01 0.50 0 0.50 5.772357 0.6311045
100 1.0 10 0.01 0.50 0 0.50 7.608539 0.4900470
100 0.1 10 0.01 0.75 0 0.50 5.255159 0.6788827
100 0.3 10 0.01 0.75 0 0.50 5.107286 0.6958214
100 0.6 10 0.01 0.75 0 0.50 5.713991 0.6365136
100 1.0 10 0.01 0.75 0 0.50 7.245106 0.5094237
100 0.1 10 0.01 1.00 0 0.50 5.314599 0.6718468
100 0.3 10 0.01 1.00 0 0.50 5.210123 0.6847515
100 0.6 10 0.01 1.00 0 0.50 5.721072 0.6385597
100 1.0 10 0.01 1.00 0 0.50 7.628993 0.4844245
100 0.1 10 0.01 0.50 1 0.50 5.247074 0.6819015
100 0.3 10 0.01 0.50 1 0.50 5.147833 0.6908096
100 0.6 10 0.01 0.50 1 0.50 5.997632 0.6075079
100 1.0 10 0.01 0.50 1 0.50 7.347219 0.4988769
100 0.1 10 0.01 0.75 1 0.50 5.345217 0.6670007
100 0.3 10 0.01 0.75 1 0.50 5.147426 0.6907085
100 0.6 10 0.01 0.75 1 0.50 5.817197 0.6278098
100 1.0 10 0.01 0.75 1 0.50 7.453457 0.4904738
100 0.1 10 0.01 1.00 1 0.50 5.277627 0.6755943
100 0.3 10 0.01 1.00 1 0.50 5.196682 0.6874887
100 0.6 10 0.01 1.00 1 0.50 5.900337 0.6175367
100 1.0 10 0.01 1.00 1 0.50 7.315064 0.5027216
100 0.1 10 0.01 0.50 0 0.75 5.062321 0.7037203
100 0.3 10 0.01 0.50 0 0.75 5.220967 0.6824001
100 0.6 10 0.01 0.50 0 0.75 5.346774 0.6728348
100 1.0 10 0.01 0.50 0 0.75 6.395109 0.5778734
100 0.1 10 0.01 0.75 0 0.75 5.161592 0.6908937
100 0.3 10 0.01 0.75 0 0.75 4.976497 0.7106034
100 0.6 10 0.01 0.75 0 0.75 5.240580 0.6860804
100 1.0 10 0.01 0.75 0 0.75 6.178190 0.6029167
100 0.1 10 0.01 1.00 0 0.75 5.171477 0.6884230
100 0.3 10 0.01 1.00 0 0.75 4.951624 0.7140881
100 0.6 10 0.01 1.00 0 0.75 5.256776 0.6861548
100 1.0 10 0.01 1.00 0 0.75 5.914632 0.6213765
100 0.1 10 0.01 0.50 1 0.75 5.060289 0.7037068
100 0.3 10 0.01 0.50 1 0.75 5.224634 0.6829331
100 0.6 10 0.01 0.50 1 0.75 5.376495 0.6674979
100 1.0 10 0.01 0.50 1 0.75 6.333449 0.5878078
100 0.1 10 0.01 0.75 1 0.75 5.128064 0.6941618
100 0.3 10 0.01 0.75 1 0.75 4.917102 0.7172861
100 0.6 10 0.01 0.75 1 0.75 5.265306 0.6847760
100 1.0 10 0.01 0.75 1 0.75 6.332537 0.5801943
100 0.1 10 0.01 1.00 1 0.75 5.101171 0.6972611
100 0.3 10 0.01 1.00 1 0.75 4.952497 0.7134103
100 0.6 10 0.01 1.00 1 0.75 5.547864 0.6536705
100 1.0 10 0.01 1.00 1 0.75 6.294692 0.5929423
100 0.1 10 0.01 0.50 0 1.00 5.227905 0.6829589
100 0.3 10 0.01 0.50 0 1.00 4.993494 0.7087367
100 0.6 10 0.01 0.50 0 1.00 5.123250 0.6970568
100 1.0 10 0.01 0.50 0 1.00 5.802983 0.6305182
100 0.1 10 0.01 0.75 0 1.00 5.214809 0.6834389
100 0.3 10 0.01 0.75 0 1.00 4.932750 0.7154913
100 0.6 10 0.01 0.75 0 1.00 5.240126 0.6858164
100 1.0 10 0.01 0.75 0 1.00 6.238413 0.5902703
100 0.1 10 0.01 1.00 0 1.00 5.089133 0.6988379
100 0.3 10 0.01 1.00 0 1.00 4.896197 0.7196612
100 0.6 10 0.01 1.00 0 1.00 4.949887 0.7162795
100 1.0 10 0.01 1.00 0 1.00 5.765821 0.6378066
100 0.1 10 0.01 0.50 1 1.00 5.099909 0.7002502
100 0.3 10 0.01 0.50 1 1.00 5.108513 0.6951538
100 0.6 10 0.01 0.50 1 1.00 5.302937 0.6787903
100 1.0 10 0.01 0.50 1 1.00 6.141106 0.6086367
100 0.1 10 0.01 0.75 1 1.00 5.204699 0.6844079
100 0.3 10 0.01 0.75 1 1.00 5.047552 0.7022484
100 0.6 10 0.01 0.75 1 1.00 5.215667 0.6884219
100 1.0 10 0.01 0.75 1 1.00 6.616896 0.5668439
100 0.1 10 0.01 1.00 1 1.00 5.089133 0.6988379
100 0.3 10 0.01 1.00 1 1.00 4.896197 0.7196612
100 0.6 10 0.01 1.00 1 1.00 4.949887 0.7162795
100 1.0 10 0.01 1.00 1 1.00 5.765821 0.6378066

Best-Performing Model(s) on the validation dataset
nrounds eta max_depth gamma colsample_bytree min_child_weight subsample RMSE R2
100 0.3 10 0.01 1 0 1 4.896197 0.7196612
100 0.3 10 0.01 1 1 1 4.896197 0.7196612

Sulfate AOD XGBoost

Model Performance of XGBoost models on the validation dataset
Model: Predicting Sulfate Ion Concentration using AOD Parameters
nrounds eta max_depth gamma colsample_bytree min_child_weight subsample RMSE R2
100 0.1 10 0.01 0.50 0 0.50 0.8706902 0.6728096
100 0.3 10 0.01 0.50 0 0.50 0.9233408 0.6132309
100 0.6 10 0.01 0.50 0 0.50 1.0757872 0.5007694
100 1.0 10 0.01 0.50 0 0.50 1.5238341 0.3702790
100 0.1 10 0.01 0.75 0 0.50 0.8061366 0.7224187
100 0.3 10 0.01 0.75 0 0.50 0.9754112 0.5785846
100 0.6 10 0.01 0.75 0 0.50 1.0579587 0.5079448
100 1.0 10 0.01 0.75 0 0.50 1.7881154 0.1531747
100 0.1 10 0.01 1.00 0 0.50 0.8201459 0.7103405
100 0.3 10 0.01 1.00 0 0.50 0.8768430 0.6544411
100 0.6 10 0.01 1.00 0 0.50 1.0039449 0.5692425
100 1.0 10 0.01 1.00 0 0.50 1.4462495 0.2813298
100 0.1 10 0.01 0.50 1 0.50 0.8834390 0.6651661
100 0.3 10 0.01 0.50 1 0.50 0.8461694 0.6819181
100 0.6 10 0.01 0.50 1 0.50 1.0983460 0.4730811
100 1.0 10 0.01 0.50 1 0.50 1.6548701 0.2772234
100 0.1 10 0.01 0.75 1 0.50 0.8339869 0.6972464
100 0.3 10 0.01 0.75 1 0.50 0.9009769 0.6352409
100 0.6 10 0.01 0.75 1 0.50 1.2738477 0.3389138
100 1.0 10 0.01 0.75 1 0.50 1.6011047 0.2395892
100 0.1 10 0.01 1.00 1 0.50 0.8781957 0.6566053
100 0.3 10 0.01 1.00 1 0.50 0.9085490 0.6263576
100 0.6 10 0.01 1.00 1 0.50 1.0230735 0.5536874
100 1.0 10 0.01 1.00 1 0.50 1.4645529 0.3289649
100 0.1 10 0.01 0.50 0 0.75 0.8986461 0.6558077
100 0.3 10 0.01 0.50 0 0.75 0.8970430 0.6420132
100 0.6 10 0.01 0.50 0 0.75 1.1719560 0.3971055
100 1.0 10 0.01 0.50 0 0.75 1.1742259 0.4259074
100 0.1 10 0.01 0.75 0 0.75 0.8337276 0.7065406
100 0.3 10 0.01 0.75 0 0.75 0.8743562 0.6643169
100 0.6 10 0.01 0.75 0 0.75 1.0453291 0.5166597
100 1.0 10 0.01 0.75 0 0.75 1.1529634 0.4721355
100 0.1 10 0.01 1.00 0 0.75 0.8795913 0.6636060
100 0.3 10 0.01 1.00 0 0.75 0.8983519 0.6438571
100 0.6 10 0.01 1.00 0 0.75 0.8945099 0.6452487
100 1.0 10 0.01 1.00 0 0.75 1.1035688 0.5102339
100 0.1 10 0.01 0.50 1 0.75 0.8742027 0.6768209
100 0.3 10 0.01 0.50 1 0.75 0.9145261 0.6243414
100 0.6 10 0.01 0.50 1 0.75 0.9523275 0.5987274
100 1.0 10 0.01 0.50 1 0.75 1.2028539 0.4555982
100 0.1 10 0.01 0.75 1 0.75 0.8531060 0.7002003
100 0.3 10 0.01 0.75 1 0.75 0.9100659 0.6519024
100 0.6 10 0.01 0.75 1 0.75 0.9401201 0.6067033
100 1.0 10 0.01 0.75 1 0.75 1.1548085 0.4622089
100 0.1 10 0.01 1.00 1 0.75 0.8435438 0.6893954
100 0.3 10 0.01 1.00 1 0.75 0.9220536 0.6220539
100 0.6 10 0.01 1.00 1 0.75 0.9965925 0.5584220
100 1.0 10 0.01 1.00 1 0.75 1.0843384 0.4987373
100 0.1 10 0.01 0.50 0 1.00 0.8680757 0.6786480
100 0.3 10 0.01 0.50 0 1.00 0.9144634 0.6342408
100 0.6 10 0.01 0.50 0 1.00 0.9966964 0.5508666
100 1.0 10 0.01 0.50 0 1.00 1.1674735 0.3961598
100 0.1 10 0.01 0.75 0 1.00 0.8971266 0.6567177
100 0.3 10 0.01 0.75 0 1.00 0.9304139 0.6305538
100 0.6 10 0.01 0.75 0 1.00 1.0099378 0.5449799
100 1.0 10 0.01 0.75 0 1.00 1.2356653 0.3567881
100 0.1 10 0.01 1.00 0 1.00 0.8901790 0.6568077
100 0.3 10 0.01 1.00 0 1.00 0.8845404 0.6569163
100 0.6 10 0.01 1.00 0 1.00 1.0121973 0.5426659
100 1.0 10 0.01 1.00 0 1.00 1.0295449 0.5291270
100 0.1 10 0.01 0.50 1 1.00 0.8573550 0.6868568
100 0.3 10 0.01 0.50 1 1.00 0.9255808 0.6296975
100 0.6 10 0.01 0.50 1 1.00 0.9149576 0.6264332
100 1.0 10 0.01 0.50 1 1.00 1.1079909 0.4594603
100 0.1 10 0.01 0.75 1 1.00 0.8865054 0.6707632
100 0.3 10 0.01 0.75 1 1.00 0.8731268 0.6637978
100 0.6 10 0.01 0.75 1 1.00 0.9832748 0.5669270
100 1.0 10 0.01 0.75 1 1.00 1.0210319 0.5468244
100 0.1 10 0.01 1.00 1 1.00 0.8901790 0.6568077
100 0.3 10 0.01 1.00 1 1.00 0.8845404 0.6569163
100 0.6 10 0.01 1.00 1 1.00 1.0121973 0.5426659
100 1.0 10 0.01 1.00 1 1.00 1.0295449 0.5291270

Best-Performing Model(s) on the validation dataset
nrounds eta max_depth gamma colsample_bytree min_child_weight subsample RMSE R2
100 0.1 10 0.01 0.75 0 0.5 0.8061366 0.7224187

Nitrate AOD XGBoost

Model Performance of XGBoost models on the validation dataset
Model: Predicting Nitrate Ion Concentration using AOD Parameters
nrounds eta max_depth gamma colsample_bytree min_child_weight subsample RMSE R2
100 0.1 10 0.01 0.50 0 0.50 2.804656 0.4852618
100 0.3 10 0.01 0.50 0 0.50 3.026486 0.4389449
100 0.6 10 0.01 0.50 0 0.50 3.034325 0.4446241
100 1.0 10 0.01 0.50 0 0.50 5.073262 0.2120794
100 0.1 10 0.01 0.75 0 0.50 2.853184 0.4797394
100 0.3 10 0.01 0.75 0 0.50 3.189455 0.3753539
100 0.6 10 0.01 0.75 0 0.50 3.458619 0.2805441
100 1.0 10 0.01 0.75 0 0.50 4.849888 0.2020066
100 0.1 10 0.01 1.00 0 0.50 2.888169 0.4623542
100 0.3 10 0.01 1.00 0 0.50 3.116762 0.4178904
100 0.6 10 0.01 1.00 0 0.50 3.769652 0.2997668
100 1.0 10 0.01 1.00 0 0.50 4.771460 0.1836094
100 0.1 10 0.01 0.50 1 0.50 2.781835 0.4968600
100 0.3 10 0.01 0.50 1 0.50 2.940768 0.4308830
100 0.6 10 0.01 0.50 1 0.50 3.764324 0.2376594
100 1.0 10 0.01 0.50 1 0.50 4.903806 0.1331123
100 0.1 10 0.01 0.75 1 0.50 2.794847 0.4892918
100 0.3 10 0.01 0.75 1 0.50 2.812117 0.4936663
100 0.6 10 0.01 0.75 1 0.50 3.591559 0.2891315
100 1.0 10 0.01 0.75 1 0.50 5.148766 0.2204648
100 0.1 10 0.01 1.00 1 0.50 2.709879 0.5305819
100 0.3 10 0.01 1.00 1 0.50 2.849651 0.4876117
100 0.6 10 0.01 1.00 1 0.50 3.655816 0.3653959
100 1.0 10 0.01 1.00 1 0.50 5.332012 0.1441810
100 0.1 10 0.01 0.50 0 0.75 2.796162 0.4925347
100 0.3 10 0.01 0.50 0 0.75 2.966395 0.4278444
100 0.6 10 0.01 0.50 0 0.75 3.534210 0.3699708
100 1.0 10 0.01 0.50 0 0.75 3.357962 0.3203458
100 0.1 10 0.01 0.75 0 0.75 2.768253 0.5013691
100 0.3 10 0.01 0.75 0 0.75 2.908531 0.4579766
100 0.6 10 0.01 0.75 0 0.75 3.034321 0.4254164
100 1.0 10 0.01 0.75 0 0.75 3.745947 0.3261978
100 0.1 10 0.01 1.00 0 0.75 2.932241 0.4515873
100 0.3 10 0.01 1.00 0 0.75 3.275973 0.3675852
100 0.6 10 0.01 1.00 0 0.75 3.193267 0.4152746
100 1.0 10 0.01 1.00 0 0.75 3.624724 0.2879670
100 0.1 10 0.01 0.50 1 0.75 2.731569 0.5085421
100 0.3 10 0.01 0.50 1 0.75 3.208360 0.3786825
100 0.6 10 0.01 0.50 1 0.75 3.268399 0.3403682
100 1.0 10 0.01 0.50 1 0.75 3.973082 0.1989279
100 0.1 10 0.01 0.75 1 0.75 2.868757 0.4660657
100 0.3 10 0.01 0.75 1 0.75 3.077180 0.3986972
100 0.6 10 0.01 0.75 1 0.75 3.634477 0.3738522
100 1.0 10 0.01 0.75 1 0.75 4.252798 0.2106526
100 0.1 10 0.01 1.00 1 0.75 2.873708 0.4643916
100 0.3 10 0.01 1.00 1 0.75 2.903854 0.4839781
100 0.6 10 0.01 1.00 1 0.75 3.028093 0.4108710
100 1.0 10 0.01 1.00 1 0.75 3.616890 0.3374863
100 0.1 10 0.01 0.50 0 1.00 2.860763 0.4614053
100 0.3 10 0.01 0.50 0 1.00 2.883737 0.4542335
100 0.6 10 0.01 0.50 0 1.00 2.978524 0.4270430
100 1.0 10 0.01 0.50 0 1.00 3.651106 0.2391381
100 0.1 10 0.01 0.75 0 1.00 2.878473 0.4659355
100 0.3 10 0.01 0.75 0 1.00 2.622881 0.5473015
100 0.6 10 0.01 0.75 0 1.00 3.196092 0.3579687
100 1.0 10 0.01 0.75 0 1.00 2.999055 0.4336132
100 0.1 10 0.01 1.00 0 1.00 2.980545 0.4443001
100 0.3 10 0.01 1.00 0 1.00 3.134896 0.4295236
100 0.6 10 0.01 1.00 0 1.00 3.068468 0.4344545
100 1.0 10 0.01 1.00 0 1.00 3.764411 0.3404447
100 0.1 10 0.01 0.50 1 1.00 2.661194 0.5332041
100 0.3 10 0.01 0.50 1 1.00 2.789818 0.4897297
100 0.6 10 0.01 0.50 1 1.00 2.985882 0.4413791
100 1.0 10 0.01 0.50 1 1.00 3.492688 0.3601117
100 0.1 10 0.01 0.75 1 1.00 2.902084 0.4551200
100 0.3 10 0.01 0.75 1 1.00 2.863327 0.4763733
100 0.6 10 0.01 0.75 1 1.00 3.425502 0.3596883
100 1.0 10 0.01 0.75 1 1.00 3.480266 0.3049011
100 0.1 10 0.01 1.00 1 1.00 2.980545 0.4443001
100 0.3 10 0.01 1.00 1 1.00 3.134896 0.4295236
100 0.6 10 0.01 1.00 1 1.00 3.068468 0.4344545
100 1.0 10 0.01 1.00 1 1.00 3.764411 0.3404447

Best-Performing Model(s) on the validation dataset
nrounds eta max_depth gamma colsample_bytree min_child_weight subsample RMSE R2
100 0.3 10 0.01 0.75 0 1 2.622881 0.5473015

Dust Mass AOD XGBoost

Model Performance of XGBoost models on the validation dataset
Model: Predicting Dust Mass using AOD Parameters
nrounds eta max_depth gamma colsample_bytree min_child_weight subsample RMSE R2
100 0.1 10 0.01 0.50 0 0.50 1.044064 0.1686571
100 0.3 10 0.01 0.50 0 0.50 1.107920 0.1496438
100 0.6 10 0.01 0.50 0 0.50 1.273142 0.0788620
100 1.0 10 0.01 0.50 0 0.50 1.615547 0.0896869
100 0.1 10 0.01 0.75 0 0.50 1.098732 0.1365181
100 0.3 10 0.01 0.75 0 0.50 1.189688 0.0827451
100 0.6 10 0.01 0.75 0 0.50 1.188249 0.1014350
100 1.0 10 0.01 0.75 0 0.50 1.839788 0.0459243
100 0.1 10 0.01 1.00 0 0.50 1.145086 0.1007716
100 0.3 10 0.01 1.00 0 0.50 1.166280 0.1416755
100 0.6 10 0.01 1.00 0 0.50 1.481536 0.0412335
100 1.0 10 0.01 1.00 0 0.50 1.769717 0.0291995
100 0.1 10 0.01 0.50 1 0.50 1.053488 0.1521837
100 0.3 10 0.01 0.50 1 0.50 1.177568 0.0889112
100 0.6 10 0.01 0.50 1 0.50 1.250296 0.1104721
100 1.0 10 0.01 0.50 1 0.50 1.829122 0.0275318
100 0.1 10 0.01 0.75 1 0.50 1.098190 0.1278504
100 0.3 10 0.01 0.75 1 0.50 1.166023 0.0970439
100 0.6 10 0.01 0.75 1 0.50 1.455609 0.0453304
100 1.0 10 0.01 0.75 1 0.50 1.900546 0.0510682
100 0.1 10 0.01 1.00 1 0.50 1.132060 0.1210547
100 0.3 10 0.01 1.00 1 0.50 1.348984 0.0481135
100 0.6 10 0.01 1.00 1 0.50 1.214335 0.1247672
100 1.0 10 0.01 1.00 1 0.50 1.964034 0.0501294
100 0.1 10 0.01 0.50 0 0.75 1.115031 0.1157504
100 0.3 10 0.01 0.50 0 0.75 1.093044 0.1189931
100 0.6 10 0.01 0.50 0 0.75 1.241139 0.0874516
100 1.0 10 0.01 0.50 0 0.75 1.552275 0.0502430
100 0.1 10 0.01 0.75 0 0.75 1.072719 0.1600697
100 0.3 10 0.01 0.75 0 0.75 1.092982 0.1500387
100 0.6 10 0.01 0.75 0 0.75 1.307695 0.0582151
100 1.0 10 0.01 0.75 0 0.75 1.444253 0.0417317
100 0.1 10 0.01 1.00 0 0.75 1.160610 0.1171092
100 0.3 10 0.01 1.00 0 0.75 1.155885 0.1306212
100 0.6 10 0.01 1.00 0 0.75 1.276979 0.0708871
100 1.0 10 0.01 1.00 0 0.75 1.631090 0.0184742
100 0.1 10 0.01 0.50 1 0.75 1.058932 0.1503222
100 0.3 10 0.01 0.50 1 0.75 1.042265 0.1722565
100 0.6 10 0.01 0.50 1 0.75 1.146073 0.0908222
100 1.0 10 0.01 0.50 1 0.75 1.340437 0.0977267
100 0.1 10 0.01 0.75 1 0.75 1.139798 0.1152202
100 0.3 10 0.01 0.75 1 0.75 1.281436 0.0717585
100 0.6 10 0.01 0.75 1 0.75 1.331522 0.0866690
100 1.0 10 0.01 0.75 1 0.75 1.278430 0.0800888
100 0.1 10 0.01 1.00 1 0.75 1.196141 0.1007245
100 0.3 10 0.01 1.00 1 0.75 1.169476 0.1066662
100 0.6 10 0.01 1.00 1 0.75 1.245128 0.1024556
100 1.0 10 0.01 1.00 1 0.75 1.276175 0.0724593
100 0.1 10 0.01 0.50 0 1.00 1.042355 0.1627508
100 0.3 10 0.01 0.50 0 1.00 1.118329 0.1070665
100 0.6 10 0.01 0.50 0 1.00 1.241313 0.0941201
100 1.0 10 0.01 0.50 0 1.00 1.291728 0.0941799
100 0.1 10 0.01 0.75 0 1.00 1.170831 0.1118436
100 0.3 10 0.01 0.75 0 1.00 1.220491 0.0986176
100 0.6 10 0.01 0.75 0 1.00 1.316836 0.0855792
100 1.0 10 0.01 0.75 0 1.00 1.408001 0.0767177
100 0.1 10 0.01 1.00 0 1.00 1.273229 0.0631732
100 0.3 10 0.01 1.00 0 1.00 1.255933 0.0815711
100 0.6 10 0.01 1.00 0 1.00 1.406535 0.0464894
100 1.0 10 0.01 1.00 0 1.00 1.432997 0.0676630
100 0.1 10 0.01 0.50 1 1.00 1.081406 0.1272549
100 0.3 10 0.01 0.50 1 1.00 1.202606 0.0948832
100 0.6 10 0.01 0.50 1 1.00 1.129263 0.1526449
100 1.0 10 0.01 0.50 1 1.00 1.220702 0.0746314
100 0.1 10 0.01 0.75 1 1.00 1.193950 0.0929035
100 0.3 10 0.01 0.75 1 1.00 1.169124 0.0929738
100 0.6 10 0.01 0.75 1 1.00 1.337707 0.0583804
100 1.0 10 0.01 0.75 1 1.00 1.242355 0.0864373
100 0.1 10 0.01 1.00 1 1.00 1.273229 0.0631732
100 0.3 10 0.01 1.00 1 1.00 1.255933 0.0815711
100 0.6 10 0.01 1.00 1 1.00 1.406535 0.0464894
100 1.0 10 0.01 1.00 1 1.00 1.432997 0.0676630

Best-Performing Model(s) on the validation dataset
nrounds eta max_depth gamma colsample_bytree min_child_weight subsample RMSE R2
100 0.3 10 0.01 0.5 1 0.75 1.042265 0.1722565